1 - 20 of 7,166
1.
Methods Mol Biol ; 2744: 391-402, 2024.
Article En | MEDLINE | ID: mdl-38683333

This chapter describes procedures for the use of DNA sequence data to obtain and compare taxonomic identification using the public databases GenBank and Barcode of Life Data System (BOLD). The chapter begins by describing procedures used to prepare quality sequences for uploading into GenBank and BOLD. Next, steps used to query the DNA sequences against the public databases are described using GenBank BLAST and BOLD identification engines. Interpretation guidelines for the taxonomic identification assignments are presented. Finally, a procedure for evaluating the accuracy and reliability of sequences from GenBank and BOLD is provided.


DNA Barcoding, Taxonomic , Databases, Nucleic Acid , DNA Barcoding, Taxonomic/methods , Computational Biology/methods , Sequence Analysis, DNA/methods , Databases, Genetic , Software
2.
Methods Mol Biol ; 2744: 475-489, 2024.
Article En | MEDLINE | ID: mdl-38683336

The MetaZooGene Atlas and Database (MZGdb; https://metazoogene.org/mzgdb/ ) is an open-access data and metadata portal synchronized with the NCBI GenBank and BOLD data repositories. The MZGdb includes sequences for genes used for the classification and identification of marine organisms based on DNA barcoding and metabarcoding. The focus of the MZGdb is biodiversity of marine ecosystems, including phytoplankton and microbes, zooplankton and invertebrates, fish, and other marine vertebrates (pinnipeds, cetaceans, and sea turtles). DNA sequences currently included are mitochondrial cytochrome oxidase I (COI), 12S, and 16S rRNA, and nuclear 18S and 28S rRNA. The MZGdb provides data and mapping tools for assembling and downloading compilations of reference sequence data that are specific to selected genes, taxonomic groups, and/or ocean regions. An additional feature of the MZGdb is the Atlas, which summarizes data coverage and proportional completeness based on statistics of species with available sequences versus species commonly found in each ocean region. This chapter is a collaborative effort of the Scientific Committee for Ocean Research (SCOR) Working Group WG157: MetaZooGene: Toward a new global view of marine zooplankton biodiversity based on DNA metabarcoding and reference DNA sequence databases ( https://metazoogene.org ).


Aquatic Organisms , Biodiversity , DNA Barcoding, Taxonomic , Animals , Aquatic Organisms/genetics , Aquatic Organisms/classification , DNA Barcoding, Taxonomic/methods , Ecosystem , Databases, Genetic , Databases, Nucleic Acid
3.
Comput Biol Med ; 172: 108256, 2024 Apr.
Article En | MEDLINE | ID: mdl-38489989

Sepsis, a life-threatening condition triggered by the body's response to infection, presents a significant global healthcare challenge characterized by disarrayed host responses, widespread inflammation, organ impairment, and heightened mortality rates. This study introduces the ncRS database (http://www.ncrdb.cn), a meticulously curated repository housing 1144 experimentally validated non-coding RNAs (ncRNAs) intricately linked with sepsis. ncRS offers comprehensive RNA data, exhaustive experimental insights, and integrated annotations from diverse databases. This resource empowers researchers and clinicians to decipher ncRNAs' roles in sepsis pathogenesis, potentially identifying vital biomarkers for early diagnosis and prognosis, thus facilitating personalized treatments.


RNA, Untranslated , Sepsis , Humans , RNA, Untranslated/genetics , Databases, Nucleic Acid , Biomarkers , Sepsis/genetics
4.
Front Immunol ; 15: 1267963, 2024.
Article En | MEDLINE | ID: mdl-38464509

Background: Coronary artery disease (CAD) and type 2 diabetes mellitus (T2DM) are closely related. The function of immunocytes in the pathogenesis of CAD and T2DM has not been extensively studied. The quantitative bioinformatics analysis of the public RNA sequencing database was applied to study the key genes that mediate both CAD and T2DM. The biological characteristics of associated key genes and mechanism of CD8+ T and NK cells in CAD and T2DM are our research focus. Methods: With expression profiles of GSE66360 and GSE78721 from the Gene Expression Omnibus (GEO) database, we identified core modules associated with gene co-expression relationships and up-regulated genes in CAD and T2DM using Weighted Gene Co-expression Network Analysis (WGCNA) and the 'limma' software package. The enriched pathways of the candidate hub genes were then explored using GO, KEGG and GSEA in conjunction with the immune gene set (from the MSigDB database). A diagnostic model was constructed using logistic regression analysis composed of candidate hub genes in CAD and T2DM. Univariate Cox regression analysis revealed hazard ratios (HRs), 95% confidence intervals (CIs), and p-values for candidate hub genes in diagnostic model, while CIBERSORT and immune infiltration were used to assess the immune microenvironment. Finally, monocytes from peripheral blood samples and their immune cell ratios were analyzed by flow cytometry to validate our findings. Results: Sixteen candidate hub genes were identified as being correlated with immune infiltration. Univariate Cox regression analysis revealed that NPEPPS and ABHD17A were highly correlated with the diagnosis of CAD and T2DM. The results indicate that CD8+ T cells (p = 0.04) and NKbright cells (p = 3.7e-3) are significantly higher in healthy controls than in individuals with CAD or CAD combined with T2DM. The bioinformatics results on immune infiltration were well validated by flow cytometry. 
Conclusions: A series of bioinformatics studies have shown ABHD17A and NPEPPS as key genes for the co-occurrence of CAD and T2DM. Our study highlights the important effect of CD8+ T and NK cells in the pathogenesis of both diseases, indicating that they may serve as viable targets for diagnosis and therapeutic intervention.


Coronary Artery Disease , Diabetes Mellitus, Type 2 , Humans , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 2/genetics , Up-Regulation , CD8-Positive T-Lymphocytes , Killer Cells, Natural , Databases, Nucleic Acid
5.
J Microbiol Methods ; 220: 106921, 2024 May.
Article En | MEDLINE | ID: mdl-38494090

Bacteria are primarily responsible for biological water treatment processes in constructed wetland systems. Gravel in constructed wetlands serves as an essential substrate onto which complex bacterial biofilms may successfully grow and evolve. To fully understand the bacterial community in these systems it is crucial to properly isolate biofilms and process DNA from such substrates. This study looked at how best to isolate bacterial biofilms from gravel substrates in terms of bacterial richness. It considered factors including the duration of agitation during extraction, extraction temperature, and enzyme usage. Further, the 16S taxonomy data subsequently produced from Illumina MiSeq reads (using the SILVA 132 ribosomal RNA (rRNA) database on the DADA2 pipeline) were compared with the 16S data produced from Oxford Nanopore Technologies (ONT) MinION reads (using the NCBI 16S database on the EPI2ME pipeline). Finally, performance was tested by comparing the taxonomy data generated from the Illumina MiSeq and ONT MinION reads using the same (SILVA 132) database. We found no significant differences in the effective number of species observed when using different bacterial biofilm detachment techniques. However, enzyme treatment enhanced the total concentration of DNA. In terms of wetland community profiles, relative abundance differences within each sample type were clearer at the genus level. For genus-level taxonomic classification, MinION sequencing with the EPI2ME pipeline (NCBI database) produced bacterial abundance information that was poorly correlated with that from the Illumina MiSeq and DADA2 pipelines (SILVA132 database). When using the same database for each sequencing technology (SILVA132), the correlation between relative abundances at genus-level improved from negligible to moderate. 
This study provides detailed information of value to researchers working on constructed wetlands regarding efficient biofilm detachment techniques for DNA isolation and 16S metabarcoding platforms for sequencing and data analysis.


Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing , RNA, Ribosomal, 16S/genetics , Genes, rRNA , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Bacteria/genetics
6.
Zootaxa ; 5406(2): 238-252, 2024 Feb 05.
Article En | MEDLINE | ID: mdl-38480154

Eupyrochroa Blair, 1914 is a small genus of fire-colored beetles (Coleoptera: Pyrochroidae) with two putative species recorded from limited historical distributions in China. The two species, E. insignita (Fairmaire, 1894) and E. limbaticollis (Pic, 1909), have been distinguished on the basis of color differences in the pronotum and scutellum, characters now known to exhibit significant variability. In the present study, adult morphology of the two species was compared, and partial fragments of cytochrome c oxidase subunit I (COI) from 36 samples representing 14 pyrochroid species were obtained by extraction and a GenBank search. Nucleotide composition, genetic distance, and phylogeny were analyzed. The results of morphological and molecular analyses indicate consistency, suggesting that the two species are indistinguishable by any significant measure. Therefore, Eupyrochroa limbaticollis (Pic, 1909) is proposed as a junior synonym of E. insignita (Fairmaire, 1894). The species is also redescribed and illustrated, including both adults and larvae.


Coleoptera , Animals , Phylogeny , Larva , Databases, Nucleic Acid
7.
Genome Biol ; 25(1): 60, 2024 Feb 26.
Article En | MEDLINE | ID: mdl-38409096

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .


Databases, Nucleic Acid , Genome , Software
8.
mSystems ; 9(3): e0110523, 2024 Mar 19.
Article En | MEDLINE | ID: mdl-38376167

Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, Padj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for in silico viral identification and will enable more robust viral identification from metagenomic data sets. 
IMPORTANCE: The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two to four tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of in silico viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
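The accuracy metric reported above, the Matthews Correlation Coefficient (MCC), can be computed directly from a binary confusion matrix. The following is a minimal sketch, not the authors' benchmarking pipeline; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for a binary confusion matrix.

    tp/fn: viral sequences called viral / missed.
    tn/fp: non-viral sequences called non-viral / wrongly called viral.
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one ruleset on a mock metagenome.
mcc = matthews_corrcoef(tp=80, tn=90, fp=10, fn=20)
print(f"MCC = {mcc:.2f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance than plain accuracy when viral sequences are a small fraction of a metagenome.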


Benchmarking , Viruses , Ecosystem , Metagenomics/methods , Databases, Nucleic Acid
9.
Nat Comput Sci ; 4(2): 104-109, 2024 Feb.
Article En | MEDLINE | ID: mdl-38413777

Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.
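The general idea of searching samples by shared k-mers can be illustrated with a toy exact index. This sketch is not how kmindex is implemented (the abstract does not describe its data structures); it swaps in a plain Python dictionary of sets, and all names, sequences, and the 0.5 threshold are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(samples: dict, k: int) -> dict:
    """Map each k-mer to the set of sample names that contain it."""
    index = defaultdict(set)
    for name, seq in samples.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def query(index: dict, seq: str, k: int, min_fraction: float = 0.5) -> dict:
    """Return samples sharing at least min_fraction of the query's k-mers."""
    query_kmers = kmers(seq, k)
    if not query_kmers:
        return {}
    counts = defaultdict(int)
    for km in query_kmers:
        for name in index.get(km, ()):
            counts[name] += 1
    total = len(query_kmers)
    return {n: c / total for n, c in counts.items() if c / total >= min_fraction}


# Hypothetical toy "samples"; real metagenomes contain millions of reads.
samples = {"s1": "ACGTACGTGG", "s2": "TTTTGGGGCC"}
index = build_index(samples, k=4)
print(query(index, "ACGTACGT", k=4))
```

An exact index like this has no false positives but does not scale to thousands of metagenomes; tools built for that scale trade a small false-positive rate for far lower memory use.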


Genomics , Seawater , Oceans and Seas , Metagenome/genetics , Databases, Nucleic Acid
10.
BMC Res Notes ; 17(1): 35, 2024 Jan 24.
Article En | MEDLINE | ID: mdl-38268047

OBJECTIVE: A reliable taxonomic identification of species from molecular samples is the first step for many studies. For researchers unfamiliar with programming, running a BLAST analysis, filtering, and organizing results for hundreds of sequences through the BLAST web interface can be difficult. Additionally, sequences deposited in GenBank can have outdated taxonomic identification. The use of a reliable Reference Sequences Library (RSL) containing accurate taxonomically-identified sequences facilitates this task. Provided that the user has an RSL available, we developed a tool that automates the molecular taxonomic identification of sequences. RESULTS: We developed PARSID, a Python script running through the command-line that automates the routine workflow of blasting an input sequence file against the user's RSL, and retrieves the matches with the highest percentage of identity in five steps. PARSID accepts cut-off parameters and supplementary information in a .csv file for filtering the results. The final output is visualized in a spreadsheet. We tested its functioning using 10 input sequences simulating different situations of the molecular taxonomic identification of sequences against an example RSL containing 25 sequences. Step-by-step instructions and test files are publicly available at https://github.com/kokinide/PARSID.git .
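The core filtering step described above (keep, for each query, the match with the highest percentage of identity above a cut-off) can be sketched as follows. This is not PARSID's code; it assumes BLAST+ tabular output (`-outfmt 6`, whose default columns begin with query ID, subject ID, and percent identity), and the function name, the 97.0 cut-off, and the example rows are hypothetical.

```python
import csv
import io


def best_hits(tabular_text: str, min_identity: float = 97.0) -> dict:
    """Return {query_id: (subject_id, percent_identity)} for the top hit per query.

    Expects BLAST+ tabular output (-outfmt 6): tab-separated, with
    qseqid, sseqid, pident as the first three columns.
    Hits below min_identity are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, subject_id, pident = row[0], row[1], float(row[2])
        if pident < min_identity:
            continue
        if query_id not in best or pident > best[query_id][1]:
            best[query_id] = (subject_id, pident)
    return best


# Hypothetical BLAST rows: two hits for q1, one low-identity hit for q2.
example = "\n".join([
    "q1\trefA\t99.5\t650\t3\t0\t1\t650\t1\t650\t0.0\t1180",
    "q1\trefB\t98.0\t650\t13\t0\t1\t650\t1\t650\t0.0\t1120",
    "q2\trefC\t90.0\t640\t64\t0\t1\t640\t1\t640\t0.0\t830",
])
hits = best_hits(example)
print(hits)
```

Queries with no hit above the cut-off, such as q2 here, are simply absent from the result, which makes unidentified sequences easy to list separately.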


Databases, Nucleic Acid , Publications , Humans , Gene Library , Research Personnel , Workflow
11.
Nucleic Acids Res ; 52(4): 1628-1644, 2024 Feb 28.
Article En | MEDLINE | ID: mdl-38261968

A growing body of evidence indicates an important role of miRNAs in cancer; however, there is no definitive, convenient-to-use list of cancer-related miRNAs or miRNA genes that may serve as a reference for analyses of miRNAs in cancer. To this end, we created a list of 165 cancer-related miRNA genes called the Cancer miRNA Census (CMC). The list is based on a score, built on various types of functional and genetic evidence for the role of particular miRNAs in cancer, e.g. miRNA-cancer associations reported in databases, associations of miRNAs with cancer hallmarks, or signals of positive selection of genetic alterations in cancer. The presence of well-recognized cancer-related miRNA genes, such as MIR21, MIR155, MIR15A, MIR17 or MIRLET7s, at the top of the CMC ranking directly confirms the accuracy and robustness of the list. Additionally, to verify and indicate the reliability of CMC, we performed a validation of criteria used to build CMC, comparison of CMC with various cancer data (publications and databases), and enrichment analyses of biological pathways and processes such as Gene Ontology or DisGeNET. All validation steps showed a strong association of CMC with cancer/cancer-related processes confirming its usefulness as a reference list of miRNA genes associated with cancer.


Databases, Nucleic Acid , MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics , Reproducibility of Results
12.
Infect Genet Evol ; 118: 105557, 2024 Mar.
Article En | MEDLINE | ID: mdl-38244748

Human infections with Rocahepevirus ratti genotype C1 (HEV-C1) in Hong Kong of China, Canada, Spain, and France have drawn worldwide concern towards Rocahepevirus. This study conducted a global genetic analysis of Rocahepevirus, aiming to furnish comprehensive molecular insights and promote further research. We retrieved 817 Rocahepevirus sequences from the GenBank database through October 31, 2023, categorizing them according to research, sample collection area and date, genotype, host, and sequence length. Subsequently, we conducted descriptive epidemiological, phylogenetic evolutionary, and protein polymorphism (in length and identity) analyses on these sequences. Rocahepevirus genomes were identified across twenty-eight countries, predominantly in Asia (71.73%, 586/817) and Europe (26.44%, 216/817). The HEV-C1 dominates Rocahepevirus (77.2%, 631/817), while newly discovered Rocahepevirus genotypes (C3/C4/C5 and other unclassified genotypes) were primarily identified in Europe (25/120) and China (91/120). Muridae animals (72.5%, 592/817) serve as the primary hosts for Rocahepevirus, with other hosts encompassing species from the families Soricidae, Hominidae, Mustelidae, and Cricetidae. Additionally, Rocahepevirus genomes (C1 genotype) were identified in sewage samples recently. The phylogenetic evolution of Rocahepevirus exhibits considerable variation. Specifically, HEV-C1 can be classified into at least six genetic groups (G1 to G6), with human HEV-C1 distributed across multiple evolutionary clades. The overall ORF1 and ORF2 amino acid sequence lengths were significantly different (P < 0.001) across Rocahepevirus genotypes. HEV-C1/C2/C3 and HEV-C4/C5 displayed substantial differences in amino acid sequence identity (58.4%-59.6%). The identification of Rocahepevirus genomes has expanded across numerous countries, particularly in European and Asian countries, coinciding with an expanding host range and emergence of new genotypes. 
The evolutionary path of Rocahepevirus is intricate, where the HEV-C1 dominates globally and internally forms multiple evolutionary groups (G1 to G6), exhibiting diverse genetic variation within human HEV-C1. Significant differences exist in the protein polymorphism (in length and identity) across Rocahepevirus genotypes. Given Rocahepevirus's shift from an animal virus to a zoonotic pathogen, worldwide cooperation in monitoring Rocahepevirus genomes is vital.


Mustelidae , Viruses , Humans , Animals , Phylogeny , Molecular Epidemiology , Arvicolinae , Databases, Nucleic Acid , Hong Kong , Muridae
13.
Vet Parasitol Reg Stud Reports ; 47: 100962, 2024 01.
Article En | MEDLINE | ID: mdl-38199700

This study reports the infection and diagnosis of the protozoan morphologic complex Trichomonas gallinae in a baby red-breasted toucan (Ramphastos dicolorus). Nodular lesions on the soft palate and edema in the oral cavity were observed macroscopically. Microscopically, a granuloma with multiple layers of necrosis interspersed with inflammatory polymorphonuclear infiltrates was observed. Parasitism was confirmed by parasitological diagnosis, isolation of the flagellates in culture medium, and Polymerase Chain Reaction (PCR) using 5.8S ribosomal RNA (rRNA). Flanking internal transcribed spacer (ITS) gene regions were amplified by PCR, and the sequences were analyzed phylogenetically using MEGA 11 software. Phylogenetic analysis based on ITS1/5.8S rRNA/ITS2 sequences demonstrated high nucleotide identity with two Trichomonas sequences available in GenBank, which were more closely related to T. vaginalis (99%) than to T. gallinae (98%). Wild birds are potential transmitters of this protozoan, and rigorous monitoring of infectious and parasitic diseases in their populations is essential for their preservation. The forms of transmission of Trichomonas sp. favor the occurrence of the disease in many non-Columbiformes species, which makes monitoring of this disease in wild birds essential.


Trichomonas Infections , Trichomonas , Animals , Phylogeny , Trichomonas Infections/diagnosis , Trichomonas Infections/veterinary , Trichomonas/genetics , Birds , Databases, Nucleic Acid
14.
Database (Oxford) ; 2024, 2024 Jan 29.
Article En | MEDLINE | ID: mdl-38284937

Insect decline has become a growing concern in recent years, with studies showing alarming declines in populations of several taxa. Our knowledge about genetic spatial patterns and evolutionary history of insects still exhibits significant gaps hindering our ability to effectively conserve and manage insect populations and species. Genetic data may provide valuable insights into the diversity and the evolutionary relationships of insects' species and populations. Public repositories, such as GenBank and BOLD, containing vast archives of genetic data with associated metadata, offer an irreplaceable resource for researchers contributing to our understanding of species diversity, population structure and evolutionary relationships. However, there are some issues in using these data, as they are often scattered and may lack accuracy due to inconsistent sampling protocols and incomplete information. In this paper we describe a curated georeferenced database of genetic data collected in GenBank and BOLD, for insects listed in the International Union for Conservation of Nature (IUCN) Italian Red Lists (dragonflies, bees, saproxylic beetles and butterflies). After querying these repositories, we performed quality control and data standardization steps. We created a dataset containing approximately 33 000 mitochondrial sequences and associated metadata about taxonomy, collection localities, geographic coordinates and IUCN Red List status for 1466 species across the four insect lists. We describe the current state of geographical metadata in queried repositories for species listed under different conservation status in the Italian Red Lists to quantify data gaps posing barriers to prioritization of conservation actions. Our curated dataset is available for data repurposing and analysis, enabling researchers to conduct comparative studies. 
We emphasize the importance of filling knowledge gaps in insect diversity and distribution and highlight the potential of this dataset for promoting other research fields like phylogeography, macrogenetics and conservation strategies. Our database can be downloaded through the Zenodo repository in SQL format. Database URL:  https://zenodo.org/records/8375181.


Butterflies , Odonata , Bees , Animals , Humans , Insecta/genetics , Databases, Nucleic Acid , Geography
15.
Nucleic Acids Res ; 52(D1): D1-D9, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-38035367

The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven. Nucleic acid databases include the new NAKB for structural information and updates from GenBank, ENA, GEO, TarBase and JASPAR. The Issue's Breakthrough Article concerns NMPFamsDB for novel prokaryotic protein families and the AlphaFold Protein Structure Database has an important update. Metabolism is covered by updates from Reactome, WikiPathways and MetaboLights. Microbes are covered by RefSeq, UNITE, SPIRE and P10K; viruses by ViralZone and PhageScope. Medically-oriented databases include the familiar COSMIC, DrugBank and TTD. Genomics-related resources include Ensembl, UCSC Genome Browser and Monarch. New arrivals cover plant imaging (OPIA and PlantPAD) and crop plants (SoyMD, TCOD and CropGS-Hub). The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Over the last year the NAR online Molecular Biology Database Collection has been updated, reviewing 1060 entries, adding 97 new resources and eliminating 388 discontinued URLs, bringing the current total to 1959 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.


Computational Biology , Databases, Nucleic Acid , Databases, Genetic , Databases, Nucleic Acid/trends , Genomics , Internet , Molecular Biology/trends
16.
Nucleic Acids Res ; 52(D1): D239-D244, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-38015436

The MODOMICS database was updated with recent data and now includes new data types related to RNA modifications. Changes to the database include an expanded modification catalog, encompassing both natural and synthetic residues identified in RNA structures. This addition aids in representing RNA sequences from the RCSB PDB database more effectively. To manage the increased number of modifications, adjustments to the nomenclature system were made. Updates in the RNA sequences section include the addition of new sequences and the reintroduction of sequence alignments for tRNAs and rRNAs. The protein section was updated and connected to structures from the RCSB PDB database and predictions by AlphaFold. MODOMICS now includes a data annotation system, with 'Evidence' and 'Estimated Reliability' features, offering clarity on data support and accuracy. This system is open to all MODOMICS entries, enhancing the accuracy of RNA modification data representation. MODOMICS is available at https://iimcb.genesilico.pl/modomics/.


Databases, Nucleic Acid , RNA , Databases, Protein , RNA/chemistry , RNA/genetics , Internet , Sequence Analysis, RNA , User-Computer Interface
17.
Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-38015445

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.


Academies and Institutes , Computational Biology , Computational Biology/organization & administration , Computational Biology/trends , Academies and Institutes/organization & administration , Academies and Institutes/trends , Databases, Nucleic Acid , Europe
18.
Nucleic Acids Res ; 52(D1): D92-D97, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-37956313

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.


Genomics , Nucleotides , Computational Biology , Databases, Nucleic Acid , Internet , Reproducibility of Results , Europe
19.
Nucleic Acids Res ; 52(D1): D52-D60, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-37739414

Recent studies have demonstrated the important regulatory role of circRNAs, but an in-depth understanding of the comprehensive landscape of circRNAs across various species still remains unexplored. The current circRNA databases are often species-restricted or based on outdated datasets. To address this challenge, we have developed the circAtlas 3.0 database, which contains a rich collection of 2674 circRNA sequencing datasets, curated to delineate the landscape of circRNAs within 33 distinct tissues spanning 10 vertebrate species. Notably, circAtlas 3.0 represents a substantial advancement over its precursor, circAtlas 2.0, with the number of cataloged circRNAs escalating from 1 007 087 to 3 179 560, with 2 527 528 of them being reconstructed into full-length isoforms. circAtlas 3.0 also introduces several notable enhancements, including: (i) integration of both Illumina and Nanopore sequencing datasets to detect circRNAs of extended lengths; (ii) employment of a standardized nomenclature scheme for circRNAs, providing information of the host gene and full-length circular exons; (iii) inclusion of clinical cancer samples to explore the biological function of circRNAs within the context of cancer and (iv) links to other useful resources to enable user-friendly analysis of target circRNAs. The updated circAtlas 3.0 provides an important platform for exploring the evolution and biological implications of vertebrate circRNAs, and is freely available at http://circatlas.biols.ac.cn and https://ngdc.cncb.ac.cn/circatlas.


Databases, Nucleic Acid , Neoplasms , RNA, Circular , Animals , Humans , Neoplasms/genetics , Vertebrates/genetics
20.
Nucleic Acids Res ; 52(D1): D1327-D1332, 2024 Jan 05.
Article En | MEDLINE | ID: mdl-37650649

MicroRNAs (miRNAs) are a class of important small non-coding RNAs with critical molecular functions in almost all biological processes, and thus, they play important roles in disease diagnosis and therapy. The Human MicroRNA Disease Database (HMDD) represents an important and comprehensive resource for biomedical researchers in miRNA-related medicine. Here, we introduce HMDD v4.0, which curates 53530 miRNA-disease association entries from the literature. In comparison to HMDD v3.0 released five years ago, HMDD v4.0 contains 1.5 times more entries. In addition, some new categories have been curated, including exosomal miRNAs implicated in diseases, virus-encoded miRNAs involved in human diseases, and entries containing miRNA-circRNA interactions. We also curated sex-biased miRNAs in diseases. Furthermore, in a case study, disease similarity analysis successfully revealed that sex-biased miRNAs related to developmental anomalies are associated with a number of human diseases with sex bias. HMDD is freely accessible at http://www.cuilab.cn/hmdd.


Databases, Nucleic Acid , Disease , MicroRNAs , Humans , MicroRNAs/genetics , Disease/genetics
...